Abstract

A filter based on the Hankel Lanczos Singular Value Decomposition (HLSVD) technique is presented and applied for the first time to X-ray diffraction (XRD) data. Synthetic and real powder XRD intensity profiles of nanocrystals are used to study the filter performance at different noise levels. Results show the robustness of the HLSVD filter and its ability to extract the useful crystallographic information easily and efficiently. These characteristics make the filter an interesting and user-friendly tool for processing XRD data.
1 Introduction



In many applications of X-ray diffraction (XRD) techniques to the study of crystal properties, a key step in the data processing chain is effective and adaptive noise filtering [1,2,3,4]. Correct noise removal facilitates the separation of the useful crystallographic information from the background signal, as well as the estimation of crystal structure and domain size. Important issues in XRD data filtering are noise-suppression performance, the ability to preserve peak positions, computational cost and, finally, the possibility of using the filter as a blackbox tool. Different digital filters have been applied to XRD data, in both the spatial and the frequency domain. Simple procedures are based on polynomial filtering (and fitting) in the spatial domain [1]. A standard practice in the frequency domain is Fourier smoothing, which consists of removing the high-frequency components of the spectrum [5]. Since truncating the high-frequency components can be problematic when the noise level is high, an approach based on the Wiener Fourier (WF) filter has been proposed to clean XRD data [6]. Another approach, which makes use of the singular value decomposition (SVD), has been successfully applied to reduce the noise level in time-resolved XRD data [3,4].

In this work we describe an application of the Hankel Lanczos Singular Value Decomposition (HLSVD) algorithm to filter XRD intensity data. The proposed filter is based on a subspace-based parameter estimation method called Hankel Singular Value Decomposition (HSVD) [7], which is currently applied to Nuclear Magnetic Resonance (NMR) spectroscopy data for solvent suppression [8]. The HSVD method computes a "signal" subspace and a "noise" subspace by means of the SVD of the Hankel matrix H, whose entries are the noisy signal data points. Its computationally most intensive part is the computation of the SVD of H. Several improved versions of the algorithm have recently been developed in order to reduce the computational time [8]. In this paper, we apply the HSVD method based on the Lanczos algorithm with partial reorthogonalization (HLSVD-PRO), which has been shown to be the most accurate and efficient version available in the literature. A comparison of HSVD with its Lanczos-based variants in terms of numerical reliability and computational efficiency can be found in Ref. [8].

A criterion is presented to facilitate the separation of noise from the useful crystallographic signal. It enables the design of a blackbox filter to be used in the processing of XRD data. Here, the filter is applied to XRD data of nanocrystals. Nanocrystals are characterized by chemical and physical properties different from those of the bulk [9]: at a scale of a few nanometers, metals can crystallize in a structure different from the bulk one. Nowadays, different branches of science and engineering benefit from the properties of nanocrystalline materials [10]. In particular, recent XRD experiments have shown that intensities measured as a function of the scattering angle can be used to extract structural and domain-size information about nanocrystalline materials. Synthetic XRD datasets are generated by computing the X-ray intensity scattered by nanocrystalline samples of different size and properties with an analytic expression (see eq. (10) in Section 3). The synthetic datasets are processed and the filter performance is studied at different noise levels. Numerical tests on real XRD data of Au nanocrystalline samples of different size and properties show the robustness of the proposed filter and its ability to extract the useful crystallographic information easily and efficiently. These characteristics make this filter an interesting and user-friendly tool for the interactive processing of XRD data.

The paper is organized as follows. Section 2 is devoted to the theoretical aspects of the proposed approach. The dataset used to study the filter properties is described in Section 3. Numerical results are reported in Section 4. Finally, some conclusions are drawn in Section 5.



2 The subspace-based parameter estimation method HSVD 



Let us denote by $I_n$ the samples of the diffracted intensity signal collected at the angles $\vartheta_n$, $n = 0, \ldots, N-1$. They are modelled as the sum of $K$ exponentially damped complex sinusoids

$$I_n = I_n^{0} + e_n = \sum_{k=1}^{K} a_k \exp(i\varphi_k)\,\exp\left[(-d_k + i2\pi f_k)\,\vartheta_n\right] + e_n, \tag{1}$$

where $I_n$ and $I_n^{0}$ represent, respectively, the measured and the modelled intensity at the $n$-th scattering angle $\vartheta_n = n\Delta\vartheta + \vartheta_0$, with $\Delta\vartheta$ the angular sampling interval and $\vartheta_0$ the initial scattering angle; $a_k$ is the amplitude, $\varphi_k$ the phase, $d_k$ the damping factor and $f_k$ the frequency of the $k$-th sinusoid, $k = 1, \ldots, K$, with $K$ the number of damped sinusoids; and $e_n$ is complex white noise with a circular Gaussian distribution. It is worth noting that, since the modelled intensity must be real, complex sinusoids with nonzero frequency enter the model in complex-conjugate pairs; for this reason the value of $K$ is always increased or decreased by 2.
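To make the model concrete, the following Python sketch (NumPy assumed; the function name `damped_sinusoids` and all numerical values are illustrative, not taken from this work) evaluates the noiseless part $I_n^{0}$ of eq. (1). It also shows why components come in pairs: one zero-frequency term plus a conjugate pair $(f, \varphi)$, $(-f, -\varphi)$ with equal $a$ and $d$ gives a real profile with $K = 3$.

```python
import numpy as np

def damped_sinusoids(theta, a, phi, d, f):
    """Noiseless model I_n^0 of eq. (1) evaluated at the angles theta (rad).

    a, phi, d, f are length-K sequences of amplitudes, phases, damping
    factors and frequencies (rad^-1); returns a complex array of len(theta).
    """
    theta = np.asarray(theta, dtype=float)
    a, phi, d, f = [np.asarray(v, dtype=float) for v in (a, phi, d, f)]
    # One column per damped sinusoid, one row per scattering angle
    terms = (a * np.exp(1j * phi)) * np.exp(np.outer(theta, -d + 2j * np.pi * f))
    return terms.sum(axis=1)

# Illustrative parameters: one zero-frequency term plus one conjugate pair (K = 3)
theta = 0.2 + 0.0016 * np.arange(500)          # theta_n = theta_0 + n * dtheta
a = [10.0, 2.0, 2.0]
phi = [0.0, 0.3, -0.3]
d = [1.5, 0.8, 0.8]
f = [0.0, 12.0, -12.0]
I0 = damped_sinusoids(theta, a, phi, d, f)
print(np.max(np.abs(I0.imag)))   # zero up to rounding: the pair keeps the profile real
```

Dropping one member of the pair (for example setting $K = 2$) would leave a nonzero imaginary part, which is why $K$ is varied in steps of 2.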
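The $N$ data points defined in (1) are arranged into a Hankel matrix $H$ of dimensions $L \times M$, with $L + M = N + 1$ and $L \approx N/2$:

$$H_{L\times M} = \begin{pmatrix} I_0 & I_1 & \cdots & I_{M-1} \\ I_1 & I_2 & \cdots & I_{M} \\ \vdots & \vdots & \ddots & \vdots \\ I_{L-1} & I_{L} & \cdots & I_{N-1} \end{pmatrix}. \tag{2}$$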
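A minimal NumPy sketch of the arrangement in eq. (2) follows; the helper name `hankel_matrix` is hypothetical, and the default $L = \lfloor N/2 \rfloor$ is one concrete choice satisfying $L \approx N/2$. Each anti-diagonal holds a single sample, i.e. $H_{lm} = I_{l+m}$.

```python
import numpy as np

def hankel_matrix(I, L=None):
    """Arrange N samples into the L x M Hankel matrix of eq. (2), with L + M = N + 1."""
    I = np.asarray(I)
    N = I.size
    if L is None:
        L = N // 2                                        # L close to N/2
    M = N + 1 - L
    idx = np.arange(L)[:, None] + np.arange(M)[None, :]   # H[l, m] = I[l + m]
    return I[idx]

H = hankel_matrix(np.arange(7.0), L=3)
print(H)
# [[0. 1. 2. 3. 4.]
#  [1. 2. 3. 4. 5.]
#  [2. 3. 4. 5. 6.]]
```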



The SVD of the Hankel matrix is computed as

$$H_{L\times M} = U_{L\times L}\,\Sigma_{L\times M}\,V^{H}_{M\times M}, \tag{3}$$

where $\Sigma = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_r\}$, $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_r \geq 0$, $r = \min(L, M)$, $U$ and $V$ are unitary matrices (orthogonal for real data) and the superscript $H$ denotes the Hermitian conjugate. The SVD is computed with the Lanczos bidiagonalization algorithm with partial reorthogonalization [11]. This algorithm computes two matrix-vector products at each step. By exploiting the Hankel structure of the matrix (2) through the FFT, each of these products requires $O((L+M)\log_2(L+M))$ operations rather than $O(LM)$.
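The FFT trick can be illustrated with the following sketch (a hypothetical helper, not the HLSVD-PRO code of [11]). Since $(Hx)_l = \sum_m I_{l+m} x_m$ is a correlation of the data with $x$, it can be computed as a zero-padded FFT convolution of $I$ with the reversed vector $x$. For real data, $H^T$ is itself the $M \times L$ Hankel matrix of the same samples, so the transposed product needed by Lanczos bidiagonalization reuses the same function with $L$ replaced by $M$.

```python
import numpy as np

def hankel_matvec(I, x, L):
    """y = H x for the L x M Hankel matrix H[l, m] = I[l + m], M = N + 1 - L.

    Computed as a linear convolution of I with x reversed, evaluated by
    zero-padded FFTs, so the L x M matrix is never formed.
    """
    I = np.asarray(I)
    x = np.asarray(x)
    N = I.size
    M = N + 1 - L
    if x.size != M:
        raise ValueError("x must have length N + 1 - L")
    n_fft = 1 << int(np.ceil(np.log2(N + M - 1)))      # long enough to avoid circular wrap-around
    conv = np.fft.ifft(np.fft.fft(I, n_fft) * np.fft.fft(x[::-1], n_fft))
    y = conv[M - 1:M - 1 + L]                            # y[l] = conv[l + M - 1]
    return y.real if np.isrealobj(I) and np.isrealobj(x) else y

# Check against the dense products on random real data
rng = np.random.default_rng(0)
I = rng.standard_normal(501)
L = 251
M = I.size + 1 - L
H = I[np.arange(L)[:, None] + np.arange(M)[None, :]]
x = rng.standard_normal(M)
u = rng.standard_normal(L)
print(np.allclose(hankel_matvec(I, x, L), H @ x))     # True: H x
print(np.allclose(hankel_matvec(I, u, M), H.T @ u))   # True: H^T u, same routine with L -> M
```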

In order to obtain the "signal" subspace, the matrix $H$ is truncated to a matrix $H_K$ of rank $K$,

$$H_K = U_K\,\Sigma_K\,V_K^{H}, \tag{4}$$

where $U_K$, $V_K$ and $\Sigma_K$ are obtained by taking the first $K$ columns of $U$ and $V$, and the $K \times K$ upper-left block of $\Sigma$, respectively. As a subsequent step, the least-squares solution $E$ of the following overdetermined set of equations is computed:

$$V_K^{(\mathrm{top})}\,E^{H} = V_K^{(\mathrm{bottom})}, \tag{5}$$

where $V_K^{(\mathrm{bottom})}$ and $V_K^{(\mathrm{top})}$ are derived from $V_K$ by deleting its first and last row, respectively. The $K$ eigenvalues $z_k$ of the matrix $E$ are used to estimate the frequencies $f_k$ and the damping factors $d_k$ of the model damped sinusoids from the relationship

$$z_k = \exp\left[(-d_k + i2\pi f_k)\,\Delta\vartheta\right], \tag{6}$$

with $k = 1, \ldots, K$. The values so obtained are inserted into the model eq. (1), which yields the set of equations

$$I_n \simeq \sum_{k=1}^{K} a_k \exp(i\varphi_k)\,\exp\left[(-d_k + i2\pi f_k)\,\vartheta_n\right], \tag{7}$$

with $n = 0, 1, \ldots, N-1$. The least-squares solution of (7) provides the estimates of the amplitudes $a_k$ and phases $\varphi_k$ of the model sinusoids.
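The steps of eqs. (2)-(7) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: it replaces the Lanczos bidiagonalization with partial reorthogonalization used by HLSVD-PRO with NumPy's dense `np.linalg.svd`, so it does not have the efficiency of HLSVD-PRO, and the function name `hsvd` and the test signal are hypothetical. With noiseless data and the correct $K$, it should recover the frequencies and damping factors up to rounding errors.

```python
import numpy as np

def hsvd(I, theta0, dtheta, K, L=None):
    """Dense-SVD sketch of HSVD: returns a, phi, d, f and the filtered profile."""
    I = np.asarray(I, dtype=complex)
    N = I.size
    L = N // 2 if L is None else L
    M = N + 1 - L
    H = I[np.arange(L)[:, None] + np.arange(M)[None, :]]      # eq. (2)
    _, _, Vh = np.linalg.svd(H, full_matrices=False)           # eq. (3), H = U S V^H
    V_K = Vh.conj().T[:, :K]                                   # eq. (4): first K columns of V
    V_top, V_bottom = V_K[:-1, :], V_K[1:, :]                  # drop last / first row
    EH, *_ = np.linalg.lstsq(V_top, V_bottom, rcond=None)      # eq. (5): V_top E^H = V_bottom
    z = np.linalg.eigvals(EH.conj().T)                         # eigenvalues z_k of E
    d = -np.log(np.abs(z)) / dtheta                            # eq. (6): damping factors
    f = np.angle(z) / (2 * np.pi * dtheta)                     # eq. (6): frequencies (rad^-1)
    theta = theta0 + dtheta * np.arange(N)
    basis = np.exp(np.outer(theta, -d + 2j * np.pi * f))       # model of eq. (7)
    c, *_ = np.linalg.lstsq(basis, I, rcond=None)              # c_k = a_k exp(i phi_k)
    return np.abs(c), np.angle(c), d, f, (basis @ c).real

# Noiseless test signal: one zero-frequency term plus two conjugate pairs (K = 5)
theta0, dtheta = 0.2, 0.0016
theta = theta0 + dtheta * np.arange(500)
f_true = np.array([0.0, 12.0, -12.0, 30.0, -30.0])
d_true = np.array([1.5, 0.8, 0.8, 2.0, 2.0])
c_true = np.array([10.0, 1.0 + 0.5j, 1.0 - 0.5j, 0.5j, -0.5j])
I0 = (np.exp(np.outer(theta, -d_true + 2j * np.pi * f_true)) @ c_true).real
a, phi, d, f, I_flt = hsvd(I0, theta0, dtheta, K=5)
print(np.round(np.sort(f), 6))
```

In a filtering application, the noisy profile is passed in and the returned model profile (the last output) plays the role of the filtered signal.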



3 Dataset 



The HLSVD-PRO filter was applied to synthetic as well as real XRD data. In this section, the generation of the XRD intensity profiles and the experimental setup for the acquisition of the real data are described. Both synthetic and real XRD data refer to Au nanocrystalline samples. Nanocrystals are made of clusters of three different structure types: cuboctahedral ($\mathcal{C}$), icosahedral ($\mathcal{I}$) and decahedral ($\mathcal{D}$). For each structure type $X$ ($X = \mathcal{C}, \mathcal{I}, \mathcal{D}$), the size $n$ of the clusters follows a log-normal distribution

$$f_X(n) = \frac{\exp(-s_X^2/2)}{\sqrt{2\pi}\,s_X\,\xi_X}\,\exp\left[-\frac{(\log n - \log \xi_X)^2}{2 s_X^2}\right], \tag{8}$$

with mode $\xi_X$ and logarithmic width $s_X$. Structural distances for the different structure types $X$ are generally studied independently of the actual nanomaterial. The nearest-neighbour distance between atoms in the crystals is chosen as the reference length and arbitrarily set to $1/\sqrt{2}$, a constant for all structures $X$ and all cluster sizes $n$. Actual distances in nanoclusters are then recovered by applying a correction factor $\alpha_X(n)$ for strain, assumed to be uniform and isotropic. A convenient description of the strain factor as a function of the structure type and cluster size is

$$\alpha_X(n) = \Omega_X + (\delta_X - \Omega_X)\,\frac{\pi + 2\arctan(\ldots)}{\pi + 2\arctan(\ldots)}, \tag{9}$$

given in terms of the four parameters $[n_X, \Omega_X, \delta_X, w_X]$. The intensity scattered by nanoclusters of size $n$ and structure type $X$ is computed with the diffraction model based on the Debye function method [12,13]:

$$I_X(n, q) = A\left\{N_X(n) + \sum_{i \neq j} \frac{\sin\left[2\pi q\,u_{ij}^{n}\,\alpha_X(n)\right]}{2\pi q\,u_{ij}^{n}\,\alpha_X(n)}\right\}, \tag{10}$$

where $I_0$ is the incident X-ray intensity, $T(q')$ the Debye-Waller factor and $f(q)$ the atomic form factor; $A = I_0\,[T(q')\,f(q)]^2$; $q = 2a_{\mathrm{fcc}}\sin\vartheta/\lambda$ and $q' = q/a_{\mathrm{fcc}}$ are, respectively, the dimensionless and the usual scattering vector length, with $a_{\mathrm{fcc}}$ the f.c.c. bulk lattice constant; $N_X(n)$ is the number of atoms in the cluster, $u_{ij}^{n}$ the distance between the $i$-th and $j$-th atom, and $\alpha_X(n)$ the strain factor. The total scattered intensity is computed as

$$I(q) = \sum_X x_X \sum_{n=1}^{S_X} f_X(n)\,I_X(n, q), \tag{11}$$

where $S_X$ denotes the size of the largest cluster of type $X$, $x_X$ is the number fraction of each structure type ($\sum_X x_X = 1$), and $f_X(n)$ is the value of the log-normal size distribution (8). It is worth noting that the intensities in (10) and (11) are actually functions of the scattering angle $\vartheta$, since $q = 2a_{\mathrm{fcc}}\lambda^{-1}\sin\vartheta$.
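The ingredients of eqs. (8), (10) and (11) can be sketched as follows. This is a simplified illustration, not the code used to generate the synthetic profiles: `lognormal_weight`, `debye_intensity` and `total_intensity` are hypothetical names, the cluster geometry is left to the caller (atom coordinates in the reduced units in which the nearest-neighbour distance is $1/\sqrt{2}$), the strain factor $\alpha_X(n)$ is passed as a plain number, and $A$ defaults to 1. The Debye sum uses NumPy's normalized `np.sinc`, for which `np.sinc(x)` equals $\sin(\pi x)/(\pi x)$.

```python
import numpy as np

def lognormal_weight(n, xi, s):
    """Size distribution f_X(n) of eq. (8) with mode xi and logarithmic width s."""
    n = np.asarray(n, dtype=float)
    norm = np.exp(-s**2 / 2) / (np.sqrt(2 * np.pi) * s * xi)
    return norm * np.exp(-(np.log(n) - np.log(xi))**2 / (2 * s**2))

def debye_intensity(q, positions, alpha=1.0, A=1.0):
    """Eq. (10) for one cluster: A * (N + sum over i != j of the sinc terms).

    q: dimensionless scattering vector length(s); positions: (N, 3) array of
    atom coordinates in reduced units; alpha: strain factor alpha_X(n).
    """
    q = np.atleast_1d(np.asarray(q, dtype=float))
    pos = np.asarray(positions, dtype=float)
    n_atoms = len(pos)
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    u = dist[~np.eye(n_atoms, dtype=bool)]               # all ordered pairs i != j
    # sin(2 pi q u alpha) / (2 pi q u alpha) == np.sinc(2 q u alpha)
    pair_sum = np.sinc(2.0 * np.outer(q, u * alpha)).sum(axis=1)
    return A * (n_atoms + pair_sum)

def total_intensity(fractions, weights, cluster_intensities):
    """Eq. (11): sum over X of x_X * sum over n of f_X(n) * I_X(n, q).

    fractions: {X: x_X}; weights: {X: f_X(n) for n = 1..S_X};
    cluster_intensities: {X: array of shape (S_X, n_q) holding I_X(n, q)}.
    """
    return sum(x_X * np.asarray(weights[X]) @ np.asarray(cluster_intensities[X])
               for X, x_X in fractions.items())

# Two atoms at unit distance: I(q) = 2 + 2 sin(2 pi q)/(2 pi q), and I = N^2 = 4 at q = 0
pair = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(debye_intensity([0.0, 0.25], pair))   # approximately [4.0, 3.2732], i.e. [N**2, 2 + 4/pi]
```

The $q \to 0$ limit, where every pair term tends to 1 and the intensity becomes $A N^2$, is a convenient sanity check for any implementation of eq. (10).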
Experimental XRD intensity profiles are collected by counting, at each scattering angle $\vartheta_n$, the number of scattered photons, which gives the diffracted intensity signal $I_n$. For such counting events the data are affected by Poisson noise. Since the number of photons scattered at each angle $\vartheta_n$ is large, the Poisson-distributed noise can be approximated by Gaussian-distributed noise, as required in Section 2.

Noisy synthetic XRD intensity profiles were built by generating Poisson-distributed random profiles with the intensity $I$ of eq. (11) taken as the mean value of the Poisson process. As a measure of the noise level, the noise-to-signal ratio (NSR) was defined as

$$\mathrm{NSR} = \frac{\lVert \mathcal{P}(FI) - FI \rVert}{\lVert \mathcal{P}(FI) \rVert}, \tag{12}$$

where $\mathcal{P}(I)$ denotes a Poisson process with mean value $I$ and $F$ is the scaling factor introduced below. Figure 1 displays XRD intensity profiles with increasing NSR. They were obtained by setting $\lambda = 0.15418$ nm and $a_{\mathrm{fcc}} = 0.40786$ nm in eq. (10). The set of parameters used to compute the synthetic profiles is summarized in Table 1. Different NSR values were obtained by scaling the scattered intensity (11) by a factor $F$. Figure 2 shows the NSR of the synthetic profiles as a function of the scaling factor $F$, for values of $F$ up to 2. This range contains the NSR values usually measured in experimental profiles.
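The noise model behind eq. (12) can be simulated directly. The sketch below uses NumPy's `Generator.poisson`; the single-peak profile and the function name `nsr` are hypothetical. Because a Poisson variable with mean $\mu$ has variance $\mu$, the numerator of eq. (12) is close to $\sqrt{\sum_n F I_n}$ for large counts, so the NSR decreases roughly as $F^{-1/2}$ when the scaling factor grows.

```python
import numpy as np

def nsr(I, F, rng):
    """Eq. (12) for one Poisson realization P(F I) of the scaled profile F * I."""
    mean = F * np.asarray(I, dtype=float)
    noisy = rng.poisson(mean).astype(float)
    return np.linalg.norm(noisy - mean) / np.linalg.norm(noisy)

# Hypothetical noiseless profile: flat background plus one Gaussian-shaped peak
theta = np.linspace(0.2, 1.0, 500)
I = 200.0 + 2000.0 * np.exp(-((theta - 0.6) / 0.02)**2)
rng = np.random.default_rng(1)
for F in (0.5, 1.0, 2.0):
    print(F, nsr(I, F, rng))    # NSR shrinks roughly as 1/sqrt(F)
```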

We also considered real data in order to validate our method. Three different samples were prepared, with mean diameters of 2.0, 3.2 and 4.1 nm, respectively (as measured by TEM). The size distributions had approximately the same width (FWHM ≈ 1 nm) for all three systems. Powder XRD measurements were performed at the XRD beamline of the Brazilian Synchrotron Light facility (LNLS, Campinas, Brazil) using 8.040 keV photons at room temperature. For further details see Ref. [14].
4 Numerical results



Noisy synthetic XRD patterns were generated for nanocrystalline samples of increasing size, from 2 to 4 nm, with Poisson-distributed noise of increasing NSR, from 2% to 10%. The HLSVD-PRO filter was then applied to the noisy synthetic XRD signals in order to study its properties under controlled conditions. A key step in the filtering procedure is the selection of the number $K$ of damped sinusoids in the model function of the HLSVD-PRO filter. Here, a possible approach is presented, based on the following frequency selection criterion: the singular values $\lambda_k$, $k = 1, \ldots, r$, are plotted vs. the corresponding frequencies $f_k$ of the sinusoids in eq. (1). This choice facilitates a direct comparison of the results of the proposed filter with those obtained by other filters based on a frequency approach. It was observed that crystallographic XRD intensity signals generally show a clear transition from a low-frequency region, characterized by large singular values $\lambda_k$, to a high-frequency region with small singular values. The index $K$ of the frequency $f_K$ at which the transition occurs provides the number of damped sinusoids to be used in the HLSVD-PRO filter.
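In this work the transition is located by visual inspection. Purely as an illustration of how the criterion could be automated, the hypothetical helper below assumes that the pairs $(f_k, \lambda_k)$ are already available, sorts them by frequency and places the cut at the largest drop of $\log_{10}\lambda_k$ between consecutive frequencies. This largest-gap rule is an assumption made for the sketch, not part of the method described here, and the numbers in the usage lines are invented.

```python
import numpy as np

def select_order(freqs, sing_vals):
    """Hypothetical helper: cut at the largest drop of log10(lambda) along frequency.

    freqs, sing_vals: paired arrays (f_k, lambda_k). Returns (K, f_K), where
    f_K is the highest kept frequency, counting from the lowest frequency.
    """
    order = np.argsort(freqs)
    f = np.asarray(freqs, dtype=float)[order]
    lam = np.asarray(sing_vals, dtype=float)[order]
    drops = np.log10(lam[:-1]) - np.log10(lam[1:])   # positive where lambda falls
    K = int(np.argmax(drops)) + 1                       # components before the largest drop
    return K, float(f[K - 1])

freqs = np.array([0.0, 5.0, 12.0, 20.0, 28.0, 35.0, 40.0, 48.0, 60.0])
lams = np.array([900.0, 300.0, 120.0, 60.0, 30.0, 15.0, 0.4, 0.3, 0.2])
print(select_order(freqs, lams))   # (6, 35.0)
```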

Figure 3 displays an example of application of the HLSVD-PRO filter. A noisy synthetic XRD intensity profile is shown at the top of the figure. It corresponds to X-ray scattering from a Au sample of 3 nm size, with Poisson-like noise of NSR = 10%. The filtered signal shown in the middle of the figure was obtained by setting $K = 9$ in the HLSVD-PRO filter. This value was estimated by visually inspecting the plot of the singular values $\lambda_k$ vs. the frequencies $f_k$ (see top of fig. (4)). Specifically, a transition from large to small $\lambda_k$ was observed at the frequency $f_K = 35$ rad$^{-1}$, which is the $K$-th frequency in the set of frequencies sorted in increasing order. For comparison, the discrete Fourier transform (DFT) of the noisy synthetic XRD signal is reported at the bottom of fig. (4). A transition from large to small amplitudes again occurs in the same region of the spectrum as at the top of the figure. However, the transition frequency is much more difficult to locate than in the HLSVD-PRO case, which makes it troublesome to apply DFT and WF filters to clean noisy XRD data. This difference between the HLSVD-PRO and Fourier-based filters is relevant when the filter is intended to be used during interactive XRD data analysis, where an easy-to-use blackbox filter becomes crucial.
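For a comparison like the one in the bottom panel of fig. (4), the DFT frequency axis must be expressed in the same units as $f_k$ in eq. (1), i.e. cycles per radian (rad$^{-1}$). A minimal NumPy sketch (the helper name and the test cosine are hypothetical) uses `np.fft.rfftfreq` with the angular step $\Delta\vartheta$; the bin spacing is $1/(N\Delta\vartheta)$, here $1/(500 \times 0.0016) = 1.25$ rad$^{-1}$.

```python
import numpy as np

def dft_amplitude(I, dtheta):
    """One-sided DFT amplitude spectrum of a profile sampled every dtheta rad.

    The frequency axis is in cycles per radian (rad^-1), the unit of f_k in eq. (1).
    """
    I = np.asarray(I, dtype=float)
    freqs = np.fft.rfftfreq(I.size, d=dtheta)
    return freqs, np.abs(np.fft.rfft(I))

# Hypothetical test profile: constant background plus a cosine at 35 rad^-1
dtheta = 0.0016
theta = 0.2 + dtheta * np.arange(500)
freqs, amp = dft_amplitude(1.0 + np.cos(2 * np.pi * 35.0 * theta), dtheta)
peak = freqs[1:][np.argmax(amp[1:])]     # skip the zero-frequency bin
print(freqs[1], peak)                    # bin spacing 1/(N * dtheta) and peak position
```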

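Coming back to fig. (3), the difference between the noisy and the filtered profiles is shown at the bottom. To quantify the performance of the filter, the filtered signal was compared with the noiseless synthetic XRD signal (see fig. (5)). This can be done only with synthetic signals, since noiseless experimental XRD data are not available. The filter performance was evaluated using the measure

$$\varepsilon = \frac{\lVert I_{\mathrm{exp}} - I_{\mathrm{th}} \rVert}{\lVert I_{\mathrm{flt}} - I_{\mathrm{th}} \rVert}, \tag{13}$$

where $I_{\mathrm{exp}}$, $I_{\mathrm{flt}}$ and $I_{\mathrm{th}}$ denote the noisy, filtered and noiseless profiles, respectively, so that $\varepsilon > 1$ indicates that filtering has reduced the deviation from the noiseless profile.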
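Eq. (13) and the Monte Carlo averaging described next can be sketched as follows, assuming NumPy; `performance`, `monte_carlo` and `filter_fn` are hypothetical names, and any denoising routine (for example an HSVD-based one) can be passed as `filter_fn`. The toy filter in the usage lines halves the deviation from the noiseless profile, so every run gives $\varepsilon = 2$ by construction.

```python
import numpy as np

def performance(I_exp, I_flt, I_th):
    """Measure of eq. (13): ||I_exp - I_th|| / ||I_flt - I_th||."""
    I_th = np.asarray(I_th, dtype=float)
    num = np.linalg.norm(np.asarray(I_exp, dtype=float) - I_th)
    den = np.linalg.norm(np.asarray(I_flt, dtype=float) - I_th)
    return num / den

def monte_carlo(I_th, filter_fn, n_runs=1000, seed=0):
    """Mean and sample standard deviation of eq. (13) over Poisson realizations of I_th."""
    rng = np.random.default_rng(seed)
    I_th = np.asarray(I_th, dtype=float)
    eps = np.empty(n_runs)
    for r in range(n_runs):
        I_exp = rng.poisson(I_th).astype(float)          # one noisy profile
        eps[r] = performance(I_exp, filter_fn(I_exp), I_th)
    return eps.mean(), eps.std(ddof=1)

I_th = 1000.0 + 500.0 * np.sin(np.linspace(0.0, 3.0, 200))**2   # hypothetical noiseless profile

def halve(y):
    """Toy 'filter' that halves the deviation from the noiseless profile."""
    return I_th + 0.5 * (y - I_th)

print(performance([1.0, 2.0, 5.0], [1.0, 2.0, 4.0], [1.0, 2.0, 3.0]))   # 2.0
print(monte_carlo(I_th, halve, n_runs=100))   # mean close to 2.0, std close to 0
```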



To give statistical significance to these measures, a Monte Carlo experiment was carried out. More precisely, the HLSVD-PRO filter was applied to 1000 noisy synthetic profiles generated from the same sample size, for different NSRs. The mean value and the standard deviation of the measure $\varepsilon$ were then computed over the filtered profiles. Table 2 summarizes the outcome of these Monte Carlo experiments: for each sample size and NSR, the mean and standard deviation are obtained from 1000 synthetic XRD intensity profiles with different noise realizations having the same NSR. The sensitivity of the HLSVD-PRO filter to the number $K$ of sinusoids was also studied. This number was slightly varied around the optimal value of $K$ selected with the frequency criterion, and the resulting performances were compared in order to validate the choice of the optimal $K$. In particular, $K$ was increased and decreased by 2, as discussed in Section 2. The results of this comparison are summarized in Table 3 and show that the proposed frequency criterion provides the value of $K$ corresponding to the best performance of the HLSVD-PRO filter. The filter was also applied to real XRD intensity profiles of Au samples of size 2, 3.2 and 4.1 nm. The top of Figure 6 shows the profile of a 3.2 nm Au sample with NSR = 2.3%, computed as $\lVert\sigma\rVert/\lVert I\rVert$, where $\sigma$ and $I$ are the vectors of the measured errors and of the intensity values, respectively. Since the noise of XRD signals follows the Poisson distribution, $\sigma$ is given by $\sqrt{I}$. The result obtained by HLSVD-PRO is displayed in the middle of the figure, and the plot of the singular values vs. frequency is shown at the bottom. Components with a frequency higher than $f_K = 34$ rad$^{-1}$, due to noise, were removed. Denoising a real XRD profile of 500 intensity samples, typical of those used in the present study, requires about 11 seconds using MATLAB 7 on a machine with an Intel Xeon 1.80 GHz processor and a 512 KB cache.
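For measured profiles the NSR quoted above uses the Poisson error estimate $\sigma = \sqrt{I}$, so that $\lVert\sigma\rVert = \sqrt{\sum_n I_n}$. A one-function sketch (hypothetical name, NumPy assumed, intensities in photon counts):

```python
import numpy as np

def nsr_measured(I):
    """NSR = ||sigma|| / ||I|| with Poisson errors sigma = sqrt(I) (I in counts)."""
    I = np.asarray(I, dtype=float)
    return np.linalg.norm(np.sqrt(I)) / np.linalg.norm(I)

print(nsr_measured([4.0, 4.0, 4.0, 4.0]))   # 0.5
```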



5 Conclusions 



A filter based on the HLSVD-PRO method has been presented and applied to XRD patterns of nanocluster powders. The filter performance has been studied on synthetic and real XRD patterns with different NSRs. Results show that the proposed filter is robust and computationally efficient. A further advantage is its user-friendliness, which makes it a useful blackbox tool for the processing of XRD data.
Table 1

Values of parameters used in eq. (10) to compute synthetic XRD intensity profiles. The wavelength and the f.c.c. bulk lattice constant were set to $\lambda = 0.15418$ nm and $a_{\mathrm{fcc}} = 0.40786$ nm, respectively.

| Parameter | $X = \mathcal{C}$ | $X = \mathcal{I}$ | $X = \mathcal{D}$ |
|---|---|---|---|
| $\xi_X$ | 5.0 | 5.0 | 5.0 |
| $s_X$ | 0.3 | 0.3 | 0.3 |
| $n_X$ | 4.0 | 4.0 | 6.0 |
| $\Omega_X$ | 1.0 | 1.0 | 1.0 |
| $\delta_X$ | 1.0 | 1.0 | 1.0 |
| $w_X$ | 0.5 | 0.5 | 0.5 |
Table 2

Measure $\varepsilon$ of the filter performance (see eq. (13)). For each sample size and NSR, mean and standard deviation refer to Monte Carlo experiments run on 1000 synthetic XRD intensity profiles with different noise realizations having the same NSR.

| Sample size | NSR = 2% | NSR = 5% | NSR = 10% |
|---|---|---|---|
| 2 nm | 1.89 ± 0.28 | 2.34 ± 0.18 | 2.49 ± 0.16 |
| 3 nm | 1.54 ± 0.39 | 1.87 ± 0.16 | 2.34 ± 0.20 |
| 4 nm | 1.25 ± 0.06 | 1.56 ± 0.12 | 1.89 ± 0.19 |
Table 3

Measure $\varepsilon$ (see eq. (13)) of the filter performance as a function of the order $K$ of the filter, i.e. of the cutoff frequency $f_K$. The synthetic XRD intensity data refer to different sample sizes and NSRs. The best performance corresponds to the order $K$ reported in the middle row for each NSR value.

| NSR | Order | 2 nm | 3 nm | 4 nm |
|---|---|---|---|---|
| 10% | K - 2 | 1.86 ± 0.16 | 1.25 ± 0.09 | 1.67 ± 0.10 |
| 10% | K = 9 | 2.49 ± 0.16 | 2.34 ± 0.20 | 1.89 ± 0.19 |
| 10% | K + 2 | 2.43 ± 0.42 | 2.28 ± 0.21 | 1.73 ± 0.18 |
| 5% | K - 2 | 2.17 ± 0.18 | 1.81 ± 0.16 | 1.52 ± 0.11 |
| 5% | K = 11 | 2.34 ± 0.18 | 1.87 ± 0.16 | 1.56 ± 0.12 |
| 5% | K + 2 | 2.22 ± 0.28 | 1.87 ± 0.16 | 1.48 ± 0.09 |
| 2% | K - 2 | 1.80 ± 0.21 | 1.37 ± 0.32 | 1.13 ± 0.14 |
| 2% | K = 15 | 1.89 ± 0.28 | 1.54 ± 0.39 | 1.25 ± 0.06 |
| 2% | K + 2 | 1.86 ± 0.18 | 1.46 ± 0.38 | 1.18 ± 0.09 |

Fig. 1. Synthetic XRD intensity profiles as a function of the scattering angle. Table 1 summarizes the parameters used in eq. (10) to compute the diffraction intensities; these parameters are characteristic of Au samples. The wavelength and bulk lattice constant have been set to $\lambda = 0.15418$ nm and $a_{\mathrm{fcc}} = 0.40786$ nm, respectively. From the upper to the lower profile, the NSR increases from 2 to 5% (see fig. 2 and text for details).



Fig. 2. NSR as a function of the factor F. See text for details. 



Fig. 3. 3 nm Au synthetic sample: (top) noisy (NSR = 10%) synthetic XRD intensity profile as a function of the scattering angle $\vartheta$; (middle) filtered XRD intensity profile. The HLSVD-PRO filter removes signal components with frequency above $f_K = 35$ rad$^{-1}$ (see fig. (4)); (bottom) difference between measured and filtered profiles.







Fig. 4. 3 nm Au synthetic sample with NSR = 10%: (top) singular values $\lambda_k$ vs. frequencies $f_k$, $k = 1, \ldots, r$. The frequency $f_K = 35$ rad$^{-1}$ was used to separate (filter) the high-frequency components due to noise; (bottom) portion of the DFT amplitude spectrum of the noisy synthetic XRD intensity profile. Both plots refer to the XRD intensity profile shown at the top of fig. (3).
Fig. 5. 3 nm Au synthetic sample with NSR = 10%: (top) noiseless synthetic XRD intensity profile as a function of the scattering angle; (bottom) difference between noiseless and filtered (see middle plot of fig. (3)) profiles.
Fig. 6. 3.2 nm Au real sample: (top) noisy (NSR = 2.3%) XRD intensity profile as a function of the scattering angle $\vartheta$; (middle) filtered XRD intensity profile. The HLSVD-PRO filter removes signal components with frequency above $f_K = 34$ rad$^{-1}$; (bottom) singular values $\lambda_k$ vs. frequencies $f_k$, $k = 1, \ldots, q$.